25 research outputs found

Loghub: A Large Collection of System Log Datasets towards Automated Log Analytics

Authors: He Pinjia, He Shilin, Lyu Michael R., Zhu Jieming
Publication date: 14/08/2020

Logs have been widely adopted in software system development and maintenance because of the rich system runtime information they contain. In recent years, the increase in software size and complexity has led to rapid growth in the volume of logs. To handle these large volumes of logs efficiently and effectively, a line of research focuses on intelligent log analytics powered by AI (artificial intelligence) techniques. However, only a small fraction of these techniques have reached successful deployment in industry because of the lack of public log datasets and the necessary benchmarking upon them. To fill this significant gap between academia and industry and to facilitate more research on AI-powered log analytics, we have collected and organized loghub, a large collection of log datasets. In particular, loghub provides 17 real-world log datasets collected from a wide range of systems, including distributed systems, supercomputers, operating systems, mobile systems, server applications, and standalone software. In this paper, we summarize the statistics of these datasets, introduce some practical log usage scenarios, and present a case study on anomaly detection to demonstrate how loghub facilitates research and practice in this field. At the time of writing, loghub datasets have been downloaded over 15,000 times by more than 380 organizations from both industry and academia.

Comment: Dataset available at https://zenodo.org/record/322717

arXiv.org e-Print Archive

Retromorphic Testing: A New Approach to the Test Oracle Problem

Authors: Guo Qingshuo, He Pinjia, Mang Qiuyang, Yu Boxi
Publication date: 10/10/2023

A test oracle serves as a criterion or mechanism to assess the correspondence between software output and the anticipated behavior for a given input set. In automated testing, black-box techniques, known for their non-intrusive nature in test oracle construction, are widely used, including notable methodologies like differential testing and metamorphic testing. Inspired by the mathematical concept of inverse function, we present Retromorphic Testing, a novel black-box testing methodology. It leverages an auxiliary program in conjunction with the program under test, which establishes a dual-program structure consisting of a forward program and a backward program. The input data is first processed by the forward program and then its program output is reversed to its original input format using the backward program. In particular, the auxiliary program can operate as either the forward or backward program, leading to different testing modes. The process concludes by examining the relationship between the initial input and the transformed output within the input domain. For example, to test the implementation of the sine function $\sin(x)$, we can employ its inverse function, $\arcsin(x)$, and validate the equation $x = \sin(\arcsin(x)+2k\pi), \forall k \in \mathbb{Z}$. In addition to the high-level concept of Retromorphic Testing, this paper presents its three testing modes with illustrative use cases across diverse programs, including algorithms, traditional software, and AI applications.
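The sine example in this abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `retromorphic_sin_check`, the range of `k` values, and the tolerance are assumptions made here. `math.asin` is applied first and the sine implementation under test is applied second, after which the result is compared with the original input.

```python
import math

def retromorphic_sin_check(x, sin_impl=math.sin, ks=range(-3, 4), tol=1e-9):
    """Check x == sin_impl(arcsin(x) + 2*k*pi) for a few integers k.

    sin_impl is the implementation under test (hypothetical parameter);
    math.asin acts as the auxiliary program. Returns True if the
    relation holds within tol for every k in ks.
    """
    if not -1.0 <= x <= 1.0:
        raise ValueError("arcsin is only defined on [-1, 1]")
    angle = math.asin(x)  # first program: map the input to an angle
    # second program: map the shifted angle back and compare with x
    return all(abs(sin_impl(angle + 2 * k * math.pi) - x) <= tol for k in ks)

# Inputs on which the relation is checked (chosen arbitrarily).
inputs = [-1.0, -0.5, 0.0, 0.3, 1.0]
violations = [x for x in inputs if not retromorphic_sin_check(x)]

# A deliberately faulty sine, used only to show that the check can fail.
def faulty_sin(t):
    return math.sin(t) * 1.01

faulty_detected = not retromorphic_sin_check(0.3, sin_impl=faulty_sin)
```

A floating-point tolerance is used because the round trip through `asin` and `sin` is exact only in real arithmetic, not in machine arithmetic.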

arXiv.org e-Print Archive

A Review of Modeling and Diagnostic Techniques for Eccentricity Fault in Electric Machines

Authors: He Shan, Huang Jin, Liu Zijian, Zhang Pinjia
Publication venue: 'MDPI AG'
Publication date: 01/07/2021

Research on the modeling and fault diagnosis of rotor eccentricities has been conducted during the past two decades. A variety of diagnostic theories and methods have been proposed based on different mechanisms, and existing reviews cover either one type of electric machine or one type of eccentricity. Nonetheless, the research routes of modeling and diagnosis are common, regardless of machine or eccentricity type. This article aims to review all the possible modeling and diagnostic approaches for all common types of electric machines with eccentricities and to provide suggestions for a future research roadmap. The paper indicates that the trend is toward a reliable, low-cost, non-intrusive, real-time, online, visualized diagnostic method. Observer-based diagnostic strategies are considered promising for continued research.

Directory of Open Access Journals

VBN

ROME: Testing Image Captioning Systems via Recursive Object Melting

Authors: He Pinjia, He Shilin, Li Jiaqi, Yang Yixing, Yu Boxi, Zhong Zhiqing
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 03/06/2023

Image captioning (IC) systems aim to generate a text description of the salient objects in an image. In recent years, IC systems have been increasingly integrated into our daily lives, for example in assistance for visually impaired people and description generation in Microsoft PowerPoint. However, even cutting-edge IC systems (e.g., Microsoft Azure Cognitive Services) and algorithms (e.g., OFA) can produce erroneous captions, leading to incorrect captioning of important objects, misunderstanding, and threats to personal safety. The existing testing approaches either fail to handle the complex form of IC system output (i.e., sentences in natural language) or generate unnatural images as test cases. To address these problems, we introduce Recursive Object MElting (Rome), a novel metamorphic testing approach for validating IC systems. Different from existing approaches that generate test cases by inserting objects, which easily makes the generated images unnatural, Rome melts (i.e., removes and inpaints) objects. Rome assumes that the object set in the caption of an image includes the object set in the caption of a generated image after object melting. Given an image, Rome can recursively remove its objects to generate different pairs of images. We use Rome to test one widely adopted image captioning API and four state-of-the-art (SOTA) algorithms. The results show that the test cases generated by Rome look much more natural than those of the SOTA IC testing approach and achieve comparable naturalness to the original images. Meanwhile, by generating test pairs using 226 seed images, Rome reports a total of 9,121 erroneous issues with high precision (86.47%-92.17%). In addition, we further utilize the test cases generated by Rome to retrain Oscar, which improves its performance across multiple evaluation metrics.

Comment: Accepted by ISSTA 202
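The metamorphic relation stated in this abstract (the object set of the original caption should include the object set of the caption produced after melting) amounts to a subset check. The sketch below is a hypothetical illustration and not Rome's actual code: it assumes the object sets have already been extracted from the captions, and the function names and sample data are made up for the example.

```python
def unexpected_objects(original_objects, melted_objects):
    """Return objects named in the melted-image caption but absent
    from the original-image caption. An empty result means the
    assumed relation holds for this pair."""
    return set(melted_objects) - set(original_objects)

def suspicious_pairs(pairs):
    """Keep only the (original, melted) pairs that violate the relation."""
    return [(orig, melted) for orig, melted in pairs
            if unexpected_objects(orig, melted)]

# Made-up object sets standing in for extracted caption objects.
pairs = [
    ({"dog", "ball", "grass"}, {"dog", "grass"}),    # relation holds
    ({"dog", "ball", "grass"}, {"dog", "frisbee"}),  # "frisbee" is new
]
flagged = suspicious_pairs(pairs)
```

Only the second pair is flagged, because its melted-image caption mentions an object that the original caption did not.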

arXiv.org e-Print Archive